{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CTW dataset tutorial (Part 2: classification baseline)\n",
    "\n",
    "In this part of the turotial, we will show you:\n",
    "\n",
    "  - [Framework of classification baseline](#Framework-of-classification-baseline)\n",
    "  - [Training steps](#Training-steps)\n",
    "  - [Predicting steps](#Training-steps)\n",
    "  - [Submission format](#Submission-format)\n",
    "  - [Evaluation API](#Evaluation-API)\n",
    "  - [Evaluate results locally](#Evaluate-results-locally)\n",
    "\n",
    "Notes:\n",
    "\n",
    "  > This notebook MUST be run under `$CTW_ROOT/tutorial`.\n",
    "  >\n",
    "  > All the code SHOULD be run with `Python>=3.4`. We make it compatible with `Python>=2.7` with best effort."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Framework of classification baseline\n",
    "\n",
    "We regard the text recoginition problem as a classification problem. We train models using [Tensorflow](https://www.tensorflow.org/).\n",
    "\n",
    "We only consider recognition of the top 1000 frequent observed character categories. We give up to recognize other categories, which must leads a failure on those categories.\n",
    "\n",
    "The _magic number_ `1000` is set in `classification/settings.py`. You may modify it if you want to train with another number of categories.\n",
    "\n",
    "For each character instance, we take following operations.\n",
    "\n",
    "  1. crop the image region around it\n",
    "  1. (training step only) randomly adjust saturation, brightness, contrast\n",
    "  1. (training step only) randomly apply an affine transform\n",
    "  1. per instance standardization \n",
    "  1. resize to fit the input of classification models\n",
    "  1. feed to each classification model\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training steps\n",
    "\n",
    "Notes:\n",
    "\n",
    "  > Before you run any scripts, please ensure you have requirements installed, which are described in `Tutorial part-1: Basics`.\n",
    "  >\n",
    "  > We train models on a desktop with 32 GB RAM. If your RAM is less than 32 GB, some code may fail.\n",
    "  >\n",
    "  > We train models on NVIDIA GTX TITAN X, which GPU memory is 12 GB. If your GPU memory is less than 12 GB, you may need to turn down `batch_size` in `cfgs` in `classification/train.py`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Decide categories\n",
    "\n",
    "Decide which categories are the top 1000 frequent observed character categories, and save to `products/cates.json`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../classification && python3 decide_cates.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create pickles\n",
    "\n",
    "Crop the image region around character instances, and save pickles to `products/*.pkl`, in order to avoid frequently reading `.jpg` files.\n",
    "\n",
    "Notes:\n",
    "\n",
    "  > Due to `pickle` module is not fully compatible between Python 2 and Python 3, we restrict each python script using pickle to use Python 3 by an assertion `assert six.PY3`. You MAY change all of them to `assert six.PY2` if needed, but you SHOULD NOT make a mixture use of Python 2 pickle and Python 3 pickle."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!cd ../classification && python3 create_pkl.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run train scripts\n",
    "\n",
    "Run train script with command line argument `alexnet_v2` to train `AlexNet v2`. Other choices are `overfeat`, `inception_v4`, `resnet_v2_50` and `res_net_v2_152`.\n",
    "\n",
    "Train logs and checkpoints are saved to `classification/products/train_logs_alexnet_v2/`. You can run `tensorboard` to see detailed logs.\n",
    "\n",
    "This script outputs a mount of logs and takes a long time, so we recommand you to run it with `/bin/bash` instead of running it directly in jupyter notebook.\n",
    "\n",
    "Time cost estimation (NVIDIA GTX TITAN X):\n",
    "\n",
    "  - **alexnet_v2**: 0.2 sec / step, 6 hours in total\n",
    "  - **resnet_v2_152**: 1.2 sec / step, 33 hours in total\n",
    "  - **others**: 1.0 sec / step, 28 hours in total\n",
    "\n",
    "You can browse train logs using tensorboard during training: `tensorboard --logdir=../classification/products/train_logs_alexnet_v2/`.\n",
    "\n",
    "Notes:\n",
    "\n",
    "  > If training step become slower and slower (e.g., > 2 sec / step), you can just press Ctrl+C to interrupt it, execute `sudo sh -c \"echo 3 >/proc/sys/vm/drop_caches\"` to drop caches, and rerun `train.py`. It will automatically resume training from the latest checkpoint. We save checkpoints per 1200 seconds, this magic number is set in `save_interval_secs` in `cfg_common` in `classification/train.py`. Do not run it along with other memory intensive applications.\n",
    "  >\n",
    "  > If you get a `CUDNN_STATUS_BAD_PARAM` error, you may turn down `per_process_gpu_memory_fraction` in `classification/train.py`.\n",
    "  >\n",
    "  > When training `resnet_v2_152`, tensorflow may run training step and summary step at the same time and lead to running out of memory (OOM). You may set `save_summaries_secs` in `cfgs` in `classification/train.py` to infinity (e.g., 999999) to disable summary step.\n",
    "  >\n",
    "  > You can modify `cfgs` in `classification/train.py` to add or delete models. All avaliable models are described in `classification/slim/nets/nets_factory.py`, but we have not tested whether every models are suitable to our dataset.\n",
    "  >\n",
    "  > You can update TensorFlow-Slim from [source](https://github.com/tensorflow/models/tree/master/research/slim), but please keep our customized modified `slim/train_image_classifier.py` and `slim/eval_image_classifier.py`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../classification && python3 train.py alexnet_v2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Download trained models\n",
    "\n",
    "Since training takes a lot of energy and we hate global warming, we provide checkpoints which are trained using TRAIN+VAL.\n",
    "\n",
    "Visit our homepage (https://ctwdataset.github.io/) and gain access to the checkpoints.\n",
    "\n",
    "Notes:\n",
    "\n",
    "> If you are using trained models without training steps, you need to run `python3 decide_cates.py` (see *[Decide categories](#Decide-categories)* section) with TRAIN+VAL to generate `cates.json`, the map from label ID to character category."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Predicting steps\n",
    "\n",
    "Just like [training steps](#Training-steps), only need to substitute `train.py` with `eval.py`.\n",
    "\n",
    "This step will feed each character instance in classification testing set to the model, and save the output end point (so called _logits_) of the model to `classification/products/eval_alexnet_v2.pkl`.\n",
    "\n",
    "Then, for each character instance, we sort the `logits`, and output the Top-5 candicates for each character instance to `classification/products/predictions_alexnet_v2.jsonl`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!cd ../classification && python3 eval.py alexnet_v2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Submission format\n",
    "\n",
    "Classification results MUST be UTF-8 encoded [JSON Lines](http://jsonlines.org/), each line MUST match the corresponding line in `Classification testing set annotations` (`$CTW_ROOT/data/annotations/test_cls.jsonl`), which is described in `Tutorial part-1: Basics`.\n",
    "\n",
    "```\n",
    "result (corresponding to one line in .jsonl):\n",
    "{\n",
    "    predictions: [prediction_0, prediction_1, prediction_2, ...],\n",
    "}\n",
    "\n",
    "prediction:\n",
    "[candidate_0, candidate_1, candidate_2, ...]  # there MUST be at least 5 candidates, in order to compute top-1 and top-5 accuracy\n",
    "\n",
    "candidate: str                                # length is usually 1, otherwise this prediction must be a false negative\n",
    "```\n",
    "\n",
    "Notes:\n",
    "\n",
    "  > Since our evaluation API computes both top-1 accuracy and top-5 accuracy, you MUST provide at least 5 candidates for each character instance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation API\n",
    "\n",
    "Our evaluation API in `pythonapi/eval_tools.py` works as follows.\n",
    "\n",
    "  1. Check prediction (PR) file contains exactly true number of lines. Otherwise, return error.\n",
    "  2. Check each line of the PR file is valid JSON, and conform to the submission format. Otherwise, return error.\n",
    "  3. Check PR contains exactly true number of predictions. Otherwise, return error.\n",
    "  4. Count the number of instances and the number of matched instances for each attribute combination and each size, respectively.\n",
    "\n",
    "If no error occurred, the data struct for the output of evaluation API is described below.\n",
    "\n",
    "```\n",
    "output:\n",
    "{\n",
    "    error: 0,\n",
    "    performance: {\n",
    "      all: size_performance,\n",
    "      large: size_performance,\n",
    "      medium: size_performance,\n",
    "      small: size_performance,\n",
    "    }\n",
    "}\n",
    "\n",
    "size_performance:\n",
    "{\n",
    "    attributes: [recalls_0, recalls_1, recalls_2, ..., recalls_63],        # recalls for each attribute\n",
    "    texts: {str_0: recalls_0, str_1: recalls_1, str_2: recalls_2, ...},    # recalls for each character category\n",
    "}\n",
    "\n",
    "recalls:\n",
    "{\n",
    "    n: int,        # count of instances for specified attribute combination and specified size range\n",
    "    recalls: {\n",
    "      1: int,      # count of matched instances for specified attribute combination and specified size range\n",
    "      5: int,      # count of matched instances for specified attribute combination and specified size range\n",
    "    }\n",
    "}  \n",
    "```\n",
    "\n",
    "The number `k` in `recalls_k` is represented in bitwise. e.g., `k = 5` (`000101` in binary) means:\n",
    "\n",
    "| Attribute | Yes or no |\n",
    "| --------: | :-------- |\n",
    "| occluded  | 1 |\n",
    "| bgcomplex | 0 |\n",
    "| distorted | 1 |\n",
    "| raised    | 0 |\n",
    "| wordart   | 0 |\n",
    "| handwritten | 0 |\n",
    "\n",
    "This is corresponding to character instances with attributes combination `occluded & ~bgcomplex & distorted & ~raised & ~wordart & ~handwritten`.\n",
    "\n",
    "The data struct for the output of evaluation server is described below.\n",
    "\n",
    "```\n",
    "evaluation server output:\n",
    "{\n",
    "    attributes: list,     # the configure of considered attributes, always [\"occluded\", \"bgcomplex\", etc.]\n",
    "    size_ranges: list,    # the configure of considered sizes on the evaluation server, defined in `codalab/settings.py`\n",
    "    recall_n: list,       # always [1, 5], indicates we mesure top-1 accuracy and top-5 accuraty\n",
    "    performance: {\n",
    "        all: size_performance,\n",
    "        large: size_performance,\n",
    "        medium: size_performance,\n",
    "        small: size_performance,\n",
    "    },\n",
    "}\n",
    "```\n",
    "\n",
    "The `size_performance` in the output of evaluation server slightly differs from the output of our evaluation API.\n",
    "\n",
    "  - `texts` in `size_performance` only contains top-10 frequent categories, to avoid revealing the frequency of each category on testing set, and to reduce the size of output file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluate results locally\n",
    "\n",
    "If you don't have ground truth of testing set, you cannot run following scripts."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Gather statistics\n",
    "\n",
    "Run this script to gather statistics. We will output the following files in this step:\n",
    "\n",
    "  - `judge/products/stat_frequency.json`: the frequency in the whole dataset, both training set and testing set\n",
    "  - `judge/products/plots/stat_attributes.pdf`: (described in our paper)\n",
    "  - `judge/products/plots/stat_instance_size.pdf`: (described in our paper)\n",
    "  - `judge/products/plots/stat_most_freq.pdf`: (described in our paper)\n",
    "  - `judge/products/plots/stat_num_char.pdf`: (described in our paper)\n",
    "  - `judge/products/plots/stat_num_uniq_char.pdf`: (described in our paper)\n",
    "\n",
    "Notes:\n",
    "\n",
    "> This step requires network connection to download `SimHei.ttf`, i.e.,a Chinese font file. This font file is used in rendering chinese in our statistics, and some other results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../judge && python3 statistics_in_paper.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Evaluate classification performance\n",
    "\n",
    "We will output the following files in this step:\n",
    "\n",
    "  - `<stdout>`: classification performance of each attribute\n",
    "  - `<stdout>`: classification performance for top-10 most frequent character categories\n",
    "  - `judge/products/plots/cls_precision_by_attr_size_(model_name).pdf`: (described in our paper)\n",
    "  - `judge/products/plots/cls_precision_by_model_size.pdf`: performance for each model and each size\n",
    "  - `judge/products/explore_cls.html`: performance for each model, each conbination of attributes and each size\n",
    "\n",
    "If you run `classification_perf.py` without command line arguments, it will evaluate all models listed in `cfgs` in `predictions2html.py`.\n",
    "\n",
    "Notes:\n",
    "\n",
    "  > You may result in a higher performance than paper. The reason may be our models are trained on TRAIN+VAL, while you are using validation set as testing set now.\n",
    "  >\n",
    "  > This step requires network connection to download `Chart.min.js` used to generate `explore_cls.html`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../judge && python3 classification_perf.py alexnet_v2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Show results of character instances\n",
    "\n",
    "Besides `predictions`, our baseline also output `probabilities` to results to enable showing confidence probabilities for each prediction.\n",
    "\n",
    "This step will output:\n",
    "\n",
    "  - `judge/products/test_cls_cropped.pkl`: a cache file to avoid frequently reading .jpg files\n",
    "  - `judge/products/predictions_compare.html`: (described in our paper)\n",
    "\n",
    "If you run `predictions2html.py` without command line arguments, it will evaluate all models listed in `cfgs`.\n",
    "\n",
    "Then, you can browse `judge/products/predictions_compare.html`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd ../judge && python3 predictions2html.py alexnet_v2"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
